Conclusion
Insights and general comments from different tabs
This section summarizes the insights from the different tabs. Before drawing overall conclusions and suggesting improvements to public transportation, it is important to summarize the insights obtained from our different analyses and models.
General
From a general perspective, it seems that the performance of public transportation, its usage, and its pollution levels have not improved much in the last 10 years. Furthermore, it seems that many areas could be improved to make it more efficient and sustainable.
Naive Bayes
Our Naive Bayes model shows that predicting the type of transportation from the State crossed and its value is not very accurate. This suggests that the type of transportation is not strongly related to the State crossed and its value, although other types of analysis could test that relationship with more precision. I therefore suggest studying this topic further with other analyses.
On the other hand, Naive Bayes does a great job of predicting the sentiment of Reddit posts and comments. Beyond simply flagging negative Reddit posts (or posts on other platforms, once its accuracy there has been checked), the model would be most useful for extracting the negative posts so that the main issues users encounter can be identified. That way, the issues could be addressed and solved.
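As a rough illustration of that workflow, the sketch below trains a tiny multinomial Naive Bayes classifier with add-one smoothing, keeps the posts it labels negative, and counts the words in them. It is only a minimal sketch using the Python standard library: the training posts, the labels, and the function names are made up for the example and are not the data or the implementation used in the Naive Bayes tab.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_posts):
    """Count posts per class and words per class (multinomial Naive Bayes)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_posts:
        words = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the class with the highest log-posterior, using add-one smoothing."""
    class_counts, word_counts, vocabulary = model
    total_posts = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_posts)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            smoothed = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data, invented for this example only.
TRAINING_POSTS = [
    ("the bus was late again", "negative"),
    ("train delayed and crowded", "negative"),
    ("missed my stop because the bus was late", "negative"),
    ("the new route is fast and clean", "positive"),
    ("great service and friendly driver", "positive"),
    ("the train was on time today", "positive"),
]

model = train_naive_bayes(TRAINING_POSTS)
new_posts = ["bus late and crowded", "fast and clean train"]

# Keep only the posts predicted as negative, then count their words
# as a first, crude look at what riders complain about.
negative_posts = [post for post in new_posts if predict(model, post) == "negative"]
issue_terms = Counter(word for post in negative_posts for word in post.split())
```

In practice the word counts would need stop-word removal and a much larger labeled sample before they say anything reliable about rider complaints.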
Clustering
Our clustering tab indicates that clusters seem to be present in the census blocks of the US. This is a great finding, since we can infer that there are different “types” of census blocks that could be addressed as groups. Working with a small number of clusters would allow the US to invest in their development more efficiently and economically. Dealing with each census block individually would require far more money and time, which is probably not feasible.
Dimensionality Reduction
Our data set for the census blocks had many redundant variables: PCA identified that 25 components explain more than 90% of the variance. This is a great insight because, if inferences are to be made for the different clusters, it would be much more computationally efficient to use the PCA variables. Furthermore, if the inferences are to be extrapolated to other or new census blocks, it would be much more efficient to use the PCA variables to determine their cluster and then make the inferences.
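The "how many components reach 90% of the variance" question comes down to a cumulative sum over the variance explained by each component. The sketch below shows that calculation in plain Python. The variance values are invented for illustration and are not the ones from our census block data, and the function name is hypothetical.

```python
def components_needed(component_variances, threshold=0.90):
    """Return the smallest number of leading components whose cumulative
    share of the total variance reaches the threshold."""
    total = sum(component_variances)
    running = 0.0
    # Components are assumed to be sorted from largest to smallest variance,
    # which is how PCA conventionally reports them.
    for count, variance in enumerate(component_variances, start=1):
        running += variance
        if running / total >= threshold:
            return count
    return len(component_variances)


# Made-up variances for five components (not the project's real values).
example_variances = [5.0, 3.0, 1.5, 0.3, 0.2]
n_components = components_needed(example_variances, threshold=0.90)
```

With these made-up numbers the first two components explain 80% of the variance and the first three explain 95%, so three components are enough.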
Decision Trees
Decision trees seem to be a great model for predicting the average miles per gallon based on city, agency, organization type, Primary UZA population, mode of transportation, type of service, and electric batteries used. Better still, using only agency, type of service, and mode of transportation seems to be the right choice. Thus, this model could be used to find the optimal combination of these three components in order to choose the most efficient one in every case. That way, we could reduce our carbon footprint while reducing fossil fuel costs.
EDA datasets and other data sections
People seem to care most about time and stops when writing on Reddit about public transportation. This suggests that time is one of the most important things for riders, and that having stops available where they use public transportation matters as well. Thus, we could invest in more stops and more efficient routes to reduce the time people spend on public transportation.
Additionally, another key insight is that transportation has been performing worse over time. However, the energy consumed also seems to have decreased (slowly) over time. Furthermore, the month seems to have an effect on performance, which could be offset by investing in more resources during the months when performance is lower. Finally, understanding which countries play the biggest role in manufacturing public transportation vehicles could help us understand the impact of public transportation worldwide. More advances in R&D could be made to improve the manufacturing of these vehicles and reduce their carbon footprint in these countries.
General Conclusion
As shown in our study, transportation is a key component of society. It not only allows people to move from one place to another but is also needed for the economy to function. Furthermore, it plays a big role in city and economic development as well as in the environment (accounting for almost 30% of the pollution in the US). Thus, it is important to understand how it works and how it can be improved.
One of the biggest takeaways from this study is that census blocks can be grouped, since they seem to have similar characteristics. This insight is very revealing, as implementing changes or building public transportation models for each place in the US would be too time-consuming and costly. Thus, the first suggestion to improve public transportation would be to generate twelve base transportation models (one for each cluster) that could be slightly tweaked to fit the more specific needs of each census block. This would allow the US to generate twelve very robust models that would work much better than creating over 220,000 simple models. Furthermore, improvements to each model could be rolled out to more places with only slight changes, making the process much more efficient.
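Picking the base model for a given census block then amounts to finding the cluster it belongs to. One simple way to do that, assuming each cluster can be summarized by a centroid in the PCA space, is a nearest-centroid lookup. This is an assumption that fits k-means-style clustering and is not necessarily how the clustering tab assigns blocks. The centroids, the coordinates, and the function name below are hypothetical.

```python
import math


def nearest_cluster(block_scores, centroids):
    """Return the label of the centroid closest to the block (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(block_scores, centroids[label]))


# Hypothetical centroids in a 2-component PCA space; the real analysis
# would have twelve centroids and more components.
CENTROIDS = {
    "cluster_0": (0.0, 0.0),
    "cluster_1": (5.0, 5.0),
    "cluster_2": (-4.0, 3.0),
}

new_block = (4.2, 4.9)  # made-up PCA scores for a new census block
assigned = nearest_cluster(new_block, CENTROIDS)
```

The assigned label would then select which of the base transportation models to start from before tweaking it for that block.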
Another key insight is that the agency, type of service, and mode of transportation seem to be strong determinants of fuel efficiency. This could be used to find the optimal combination of these three components in order to choose the most efficient one in every case. That way, we could reduce our carbon footprint while reducing fossil fuel costs.
Incentivizing the use of public transportation is also very important. That way, we would be able to reduce the number of cars on the road, reducing pollution and traffic. This could be done by improving the areas that are most important to people. Our Naive Bayes model does a good job of detecting negative comments, which can then be analyzed to identify the main issues people have with public transportation. With this information, issues can be addressed and solved. Additionally, what seems to matter most to people is time, yet public transportation performance has actually been declining over the past years. We have information that could be used to reverse this trend, such as the months in which the reliability of public transportation is worse. Thus, we could invest in more resources during the months when performance is lower.
Moreover, understanding which countries play the biggest role in manufacturing public transportation vehicles could help us understand the impact of public transportation worldwide. More advances in R&D could be made to improve the manufacturing of these vehicles and reduce their carbon footprint in these countries. Partnerships with these countries could be formed to pursue the common goal of reducing the carbon footprint of public transportation while improving its performance.
Finally, it is worth noting that this is a very complex topic and that this analysis only provides an overview of some of the most important aspects of public transportation. There are many other aspects that could be analyzed and that could provide more insights. Further research in fuels, economic factors, and other areas is definitely needed to effectively implement the recommendations made in this study. However, this analysis provides a good starting point for understanding the current situation of public transportation and how it might be improved.